Overview

Welcome!

Report generated: November 02, 2021

Knit by: Jared Joseph

Partner Info

Partner DataLab
Department UC Davis Library
Nodes Concepts
Edges Conceptual Links


Objectives: Showcase

Report Introduction

This report will ingest pre-formatted network data and walk readers through some preliminary network metrics. All of the text in this report is generalized to any network it could render; it is not specific to the data displayed. Because of this, some metrics may not be relevant to a specific data set. Please talk with the DataLab about the outcomes of this report for help with interpretation and applications of these outcomes.

To navigate this report, use the tabs at the top of the page. Each page will walk you through one network metric and its interpretation. The remainder of this tab will cover some network basics. If you have any questions, please contact us at datalab@ucdavis.edu.

Network Background

You may be familiar with tabular data: rows and columns containing information. You can see a representation of this data to the right on top. While this is a tidy way to store data, it artificially atomizes or separates many of the things we are interested in as researchers, social or otherwise. Network analysis is a tool to work with relational data, that is, information about how entities are connected with each other. For example, the diagram on the bottom right shows the same data as the table above, with the added benefit of showing how these individuals are connected to each other. Hover over the people to reveal the data about them.

Rather than looking only at attributes of specific data points, we are looking at the connections between data. In network analysis, data points are called nodes or vertices, and the connections between them are called edges or ties. Vertices can be anything (people, places, words, concepts) and are usually mapped to rows in a data frame. Edges contain any information on how these things connect or are related to each other. Together these parts create a network or graph, defined as a “finite set or sets of actors and the relation or relations defined on them” (Wasserman and Faust 1994).
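As a minimal sketch of how nodes and edges map onto data, the four toy people from this page can be stored as a node table and an edge table and combined with igraph. The object names (`nodes`, `edges`, `toy`) are illustrative assumptions, not names used by the report's own code.

```r
library(igraph)

# Node table: one row per person, attributes in columns
nodes <- data.frame(name = c("J", "Y", "G", "Z"),
                    age = c(30, 21, 32, 48),
                    widgets = c(1, 3, 4, 8))

# Edge table: one row per relationship between two people
edges <- data.frame(from = c("Y", "Z", "G", "G"),
                    to = c("J", "Y", "Z", "Y"),
                    label = c("Siblings", "Student", "Friends", "Parent"))

# Combine both tables into a single un-directed graph object
toy <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

vcount(toy)  # number of nodes
ecount(toy)  # number of edges
```

Node attributes stay attached to the graph, so `V(toy)$age` returns the ages in the same order as the node table.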

Data Types

Example Data Frame

Person Name Age Widgets
J 30 1
Y 21 3
G 32 4
Z 48 8

Example Network

Network Metadata

Explanation Text

Network Characteristics

Network Property Measurement
Nodes 10
Edges 15
Components 1
Isolates 0
Directed
Weighted Edges
Density 0.333
Diameter 14
Degree Centralization 0.222
Betweenness Centralization 0.441
Eigenvector Centralization 0.384

Explanations

Overview

This tab contains some meta-information about the network. This includes the number of nodes and edges, as well as more specialized characteristics and measures. The rest of this space is dedicated to explaining those things in a general way. If you would like to see more information about specific nodes in the network, you can hover over them on the right to see a tooltip about that node. You can also find nodes sorted alphabetically in the top left pull-down menu.

Components & Isolates

In network analysis, lone nodes with no edges are called isolates, while groups of nodes that are connected to each other but disconnected from the rest of the network are called components. This report works only on the single largest component of your dataset. If you have multiple components in your data, make sure this is what you expect. Because many network metrics are based on the distances between nodes, and the distance to an unreachable node is infinite, disconnected nodes would produce infinite values. If you are interested in multiple components, please pass them through the report individually.
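The sketch below illustrates these terms on a made-up six-node network and then keeps only the largest component. The graph and the object names are assumptions chosen for illustration.

```r
library(igraph)

# Two connected groups (A-B-C and D-E) plus one isolate (F)
g <- graph_from_literal(A-B, B-C, D-E, F)

comp <- components(g)
comp$no              # number of components; an isolate counts as its own component
comp$csize           # number of nodes in each component
sum(degree(g) == 0)  # number of isolates

# Keep only the nodes that belong to the largest component
largest <- induced_subgraph(g, V(g)[comp$membership == which.max(comp$csize)])
V(largest)$name
```

Here the largest component is the A-B-C chain, so D, E, and F would be dropped before any metrics are calculated.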

Directed/Un-Directed

Networks can either be directed or un-directed. A directed network treats the edges between nodes as having a specific direction of flow, while an un-directed network considers all edges to be mutual. An example of each is presented below.

A directed network tracks which node is the source and which node is the receiver for an edge. Take for example the follow mechanic on Twitter. User A can follow User B, creating a directed edge from A to B, but B does not have to follow A in return. This can be useful when trying to understand the flows of resources that are finite such as money or goods.

An un-directed network treats all ties as mutual, such that A and B are both involved equally in a tie. An example is the friend mechanic on Facebook. Once a friendship is established, both users are considered equal in the tie. This can be helpful when you do not have information on what node initiates a tie, or when events happen equally to a group of nodes, such as all nodes being connected through co-membership in a group.

Which of these is useful to you will likely change from project to project. However, it is vital to understand what kind of network you are working with, as many of the network calculations discussed later behave differently depending on whether the network is directed.

Weighted Edges

Edges in most networks are un-weighted, such that edges either exist or they do not. It is possible to have weighted edges in a network, such that some edges are considered more important than others. This is often used to represent multiple interactions. For example, if your network is composed of people, an un-weighted edge might denote two individuals as friends, whereas a weighted edge may indicate how many times two individuals have spent time together; in this situation an edge weight of 1 would be one meeting, while an edge weight of 5 would be five meetings.
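One way to arrive at such weights, sketched here under the assumption that the raw data lists one row per meeting, is to give every meeting a weight of 1 and let igraph sum the duplicates. The `meetings` data is invented for illustration.

```r
library(igraph)

# One row per meeting: A and B met three times, B and C met once
meetings <- data.frame(from = c("A", "A", "A", "B"),
                       to = c("B", "B", "B", "C"))

g <- graph_from_data_frame(meetings, directed = FALSE)
E(g)$weight <- 1  # every recorded meeting counts once

# Merge parallel edges, summing their weights into a single weighted edge
meet_net <- simplify(g, edge.attr.comb = list(weight = "sum"))

is_weighted(meet_net)
as_data_frame(meet_net, what = "edges")
```

The result has one edge per pair of people, with the A-B edge carrying a weight of 3 and the B-C edge a weight of 1.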

Projected/Bipartite

Often, you will not have individual level network data, but you will have data on group membership. For example, if you wanted to map the social networks of students, but don’t know who they actually hang around with, you may be able to use class rosters to build an approximate network. This is called a bipartite, or two-mode, network. You can “collapse” such a network into a student network, called a projected network, by assuming every student connected to a class is connected to every other student in that class. The same is true with classes, such that two classes are related to each other if a single student is enrolled in both. This assumption may not always be correct, and you need to take care if you are going to make it in your research. If a class has 300 students, it is most likely not correct to assume every student knows every other student in that class.
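The class-roster example can be sketched with igraph's `bipartite_projection()`. The roster below is made up, and the object names are assumptions for illustration only.

```r
library(igraph)

# Class rosters: one row per enrollment
rosters <- data.frame(student = c("s1", "s2", "s2", "s3"),
                      class = c("c1", "c1", "c2", "c2"))

two_mode <- graph_from_data_frame(rosters, directed = FALSE)

# igraph tells the two kinds of node apart with a logical `type` attribute
V(two_mode)$type <- V(two_mode)$name %in% rosters$class

# Collapse the two-mode network into one network per node type
proj <- bipartite_projection(two_mode)
student_net <- proj$proj1  # nodes with type == FALSE (students)
class_net <- proj$proj2    # nodes with type == TRUE (classes)

ecount(student_net)  # students tied by sharing a class
ecount(class_net)    # classes tied by sharing a student
```

In this toy roster s1 and s3 never share a class, so the student projection ties each of them only to s2.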

Density

Density is the first real graph-level metric that helps you understand what is particular about the network you are looking at. The density of a network is a numerical score showing how many ties exist in a network relative to the maximum possible in that network. Mathematically that is \(\frac{\text{Actual Edges}}{\text{Possible Edges}}\), where actual edges is the number of edges in the network, and possible edges is the number of edges there would be if every single node in the network were connected to every other node.
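As a worked example, the figures in the Network Characteristics table (10 nodes, 15 edges, un-directed) can be plugged into this formula by hand. The sketch assumes a simple un-directed network with no self-loops, where the number of possible edges is n(n - 1)/2; the variable names are illustrative.

```r
library(igraph)

n_nodes <- 10
n_edges <- 15

# Every pair of distinct nodes could share one un-directed edge
possible_edges <- n_nodes * (n_nodes - 1) / 2  # 45

round(n_edges / possible_edges, digits = 3)    # 0.333

# igraph's edge_density() does the same division; a 10-node ring has 10 edges
ring <- make_ring(10)
edge_density(ring)                             # 10 / 45
```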

Networks that are more densely connected are considered to be more cohesive and robust. This means that the removal of any specific edge or node will not have a great effect on the network as a whole. It also typically means that any one node in the network will be more likely to have access to whatever resources are in the network, as there are more potential connections in the network to search for resources.

Diameter

The diameter of a network is the longest shortest path (geodesic) in the network. Put another way, if you treated every node in a network as a starting point and measured the shortest path to the most distant other node, the largest of those distances would be the diameter. The path between these two nodes is the longest “shortest path” because it can’t include unnecessary diversions like loops or backtracking. The diameter can be a helpful measure of network size and sparsity. If you compare two networks, both with 25 nodes, a network with a diameter of 5 will be much more tightly connected than a network with a diameter of 15.
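The definition can be checked on a small example: the diameter is the largest entry in the table of shortest-path distances. The sketch also shows that igraph adds up edge weights along a path when a `weight` attribute is present, which is how a weighted network can report a diameter larger than its number of nodes. The ring graph and the weight of 2 are assumptions for illustration.

```r
library(igraph)

ring <- make_ring(10)

# Shortest-path length between every pair of nodes
dist_table <- distances(ring)
max(dist_table)  # 5: the node on the opposite side of the ring
diameter(ring)   # 5

# With weighted edges, path length is the sum of the weights along the path
E(ring)$weight <- 2
diameter(ring)                # 10
diameter(ring, weights = NA)  # 5 again when weights are ignored
```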

Centralization

Freeman Centralization (usually just called centralization) gives a sense of the shape of the network, namely how node level measures are distributed in a network. We’ll discuss node level measures next, but for now it is only important to understand that node level measures are numeric scores assigned to specific nodes rather than the network as a whole. This means that each node may have a different value.

Centralization is a measure of how unevenly node level metrics are distributed in a network. Scores closer to 0 mean the metric is evenly distributed, while scores closer to 1 indicate a concentration of the metric with a few nodes.
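To make the two ends of the scale concrete, the sketch below computes Freeman degree centralization by hand for a star (one hub holds every edge) and a ring (every node has the same degree). The helper `freeman_degree()` is a hypothetical function written for this example, and its denominator assumes a simple un-directed network without self-loops.

```r
library(igraph)

# Sum of each node's gap to the maximum degree, divided by the largest
# such sum possible for n nodes (reached by a star)
freeman_degree <- function(g) {
  d <- degree(g)
  n <- vcount(g)
  sum(max(d) - d) / ((n - 1) * (n - 2))
}

star <- make_star(10, mode = "undirected")
ring <- make_ring(10)

freeman_degree(star)  # 1: fully centralized
freeman_degree(ring)  # 0: fully egalitarian

# igraph's built-in version, with self-loops excluded from the maximum
centr_degree(star, loops = FALSE)$centralization
centr_degree(ring, loops = FALSE)$centralization
```

Real networks fall between these extremes; the report's degree centralization of 0.222 sits much closer to the ring than to the star.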

Network Viewer

Group: 1

Group: 2

Component Checker

Plot

Nodes not in the largest component

Degree

Degree Text

Degree Measure

Description

Degree counts how many edges are connected to a node. Degree gives a very rough measure of how popular or central a node is in the network. If a node has more ties, it may indicate that node as being more central or important in the network as a whole. Because degree is a raw count of the number of edges a node has, its interpretation is highly dependent on the size of the network. In a small network with only 25 total edges, having 10 of them would be significant. In a larger network with 250 total edges, 10 edges could be less impressive. Degree should thus be interpreted in the context of other nodes in the network.

Degree counts depend on whether the network is directed; this network is un-directed, so the following will not apply. If the network is directed there will be different counts for in-degree and out-degree. In-degree counts the number of edges a node is receiving, while out-degree counts the number of edges a node is sending. Total degree is the sum of these two numbers. If a node is receiving many edges but does not send any, this can be indicative of its role in the network, though the meaning of this pattern will be specific to each case.
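For a directed network, the three counts can be sketched as follows. The four-node graph is invented for illustration; in `graph_from_literal()` the `-+` operator draws an edge pointing at the `+` side.

```r
library(igraph)

# A and C both send an edge to B; B sends one edge to D
g <- graph_from_literal(A -+ B, C -+ B, B -+ D)

degree(g, mode = "in")   # edges received
degree(g, mode = "out")  # edges sent
degree(g, mode = "all")  # total degree: in-degree plus out-degree
```

Node B receives two edges and sends one, for a total degree of 3, while D only receives.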

In This Network

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
data 0 1 3 1.15 1 2.25 3 3.75 5

The network on the right has the nodes scaled such that larger nodes have a higher total degree. You can pan and zoom around the network to look at specific areas more closely. The node ID and total degree are displayed below a node. You can also use the drop-down in the upper left to find a specific node; nodes are in alphabetical order.

This network is an un-directed network, meaning all edges are mutual, and has a total of 15 edges. The node highlighted in yellow on the right has the highest total degree at 5 (33.333% of all edges). The node(s) highlighted in grey have the lowest total degree at 1. The degree centralization of this network is 0.222; recall that values closer to 0 indicate an egalitarian network, while values closer to 1 indicate a centralized network.

Degree Plot

Network Viewer

Degree Network

Geodesic Distance

Geodesic Distance Text

Geodesic Distance Measure

Description

Geodesic Distance is “the length of the shortest path via the edges or binary connections between nodes” (Kadushin 2012). In other words, if we treat the network as a map we can move along, with the nodes being stopping places and the edges being paths, the geodesic is the shortest possible path we can use to walk between two nodes.

Nodes that on average have a shorter geodesic distance to all the other nodes in the network are considered to have greater access to the resources in a network. This is because a node with a low average geodesic distance can theoretically “reach” the other nodes with less effort because it does not need to travel as far. This is our first instance of how network structure, not node attributes, can inform us about the nodes in a network. Essentially, looking at the network as a whole can tell us things about the people in it that are lost if we look only at individuals.

Note that while there is a correlation between degree counts and mean geodesic distance, one does not cause the other.
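A node's mean geodesic distance can be sketched as the row mean of the distance table, leaving out each node's distance to itself. The five-node chain below is an assumption chosen so the arithmetic is easy to follow.

```r
library(igraph)

# A chain of five nodes: 1 - 2 - 3 - 4 - 5
chain <- make_ring(5, circular = FALSE)

dist_table <- distances(chain)
diag(dist_table) <- NA  # ignore each node's distance to itself

mean_geodist <- rowMeans(dist_table, na.rm = TRUE)
mean_geodist  # 2.50 1.75 1.50 1.75 2.50
```

The middle node has the lowest mean distance (1.5) and the two end nodes the highest (2.5), matching the intuition that central nodes can reach everyone else with less effort.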

In This Network

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
data 0 1 7 1.86 4.89 5.33 6.67 8.25 9.78

The network on the right has the nodes scaled such that larger nodes have a lower mean geodesic distance. You can pan and zoom around the network to look at specific areas more closely. The node ID and geodesic distance are displayed below a node. You can also use the drop-down in the upper left to find a specific node; nodes are in alphabetical order.

The mean geodesic distance in this network is 7. The node highlighted in yellow on the right has the lowest mean geodesic distance at 4.889; it will usually be centrally located in the network. The grey node has the highest at 9.778, and would need to travel through the whole network to reach the nodes on the opposite side.

Geodesic Distance Plot

Network Viewer

Geodesic Distance Network

Betweenness

Betweenness Text

Betweenness Measure

Description

Betweenness centrality tries to calculate the extent to which a node acts as a gatekeeper or broker in the network. A broker bridges two otherwise disconnected segments of a network. If removing a node would break the network into two disconnected parts, that node has a high betweenness centrality. Fragmenting the network is not a prerequisite, however; simply acting as an effective “shortcut” in a network can also raise a node’s betweenness centrality. Betweenness is calculated using geodesic paths, and gives a higher score to nodes that lie on more of them.

The betweenness scores in this report are normalized: each node's raw count is divided by the largest value it could take in a network of this size, so scores range from 0 to 1. This way, you can easily compare nodes within the network and understand how nodes relate to each other structurally. It is possible for a node to have a betweenness score of 0 if no geodesic paths pass through it.
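A small sketch of raw and normalized betweenness on a five-node chain follows. For an un-directed network, igraph's `normalized = TRUE` divides each raw score by the number of node pairs that do not include the node itself, (n - 1)(n - 2)/2; the chain graph is an assumption for illustration.

```r
library(igraph)

# A chain of five nodes: 1 - 2 - 3 - 4 - 5
chain <- make_ring(5, circular = FALSE)

# Raw score: number of shortest paths between other nodes that pass through
raw <- betweenness(chain)
raw  # 0 3 4 3 0

# Normalized score: raw count divided by the 6 possible pairs of other nodes
n <- vcount(chain)
raw / ((n - 1) * (n - 2) / 2)
betweenness(chain, normalized = TRUE)
```

The end nodes score 0 because no shortest path runs through them, while the middle node sits on 4 of the 6 possible paths.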

In This Network

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
data 0 1 0.17 0.19 0 0.01 0.1 0.3 0.53

The network on the right has the nodes scaled such that larger nodes have a larger betweenness centrality. You can pan and zoom around the network to look at specific areas more closely. The node ID and betweenness centrality are displayed below a node. You can also use the drop-down in the upper left to find a specific node; nodes are in alphabetical order.

The mean betweenness centrality in this network is 0.168. The node highlighted in gold on the right has the largest betweenness centrality at 0.528; this node likely acts as a gatekeeper in the network. Imagine this node were removed: how would resources need to flow differently in the network without it? The grey node has the smallest betweenness centrality at 0; the network would likely be unaffected by its removal in terms of resource flows.

Betweenness Plot

Betweenness Viewer

Betweenness Network

Eigenvector

Eigenvector Text

Eigenvector Measure

Description

Eigenvector centrality is commonly known as a measure of “popular friends.” Rather than looking at the network position of a node, it looks at the network positions of the nodes connected to it. Nodes with a high eigenvector score will be connected to nodes that are more prominent in the network. Nodes with low degree can have high eigenvector scores if they are connected to important nodes. In real-life networks this can be interpreted as being close to influential others in a network.
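The “popular friends” idea can be sketched on a made-up network in which node P has a single tie, but to the hub H, while node Y has two ties out on the periphery. The graph and node names are assumptions for illustration.

```r
library(igraph)

# H is a hub; P hangs off the hub; Y sits on a chain far from it
g <- graph_from_literal(H-A, H-B, H-C, H-D, H-P,
                        A-B, C-D,
                        D-Z, Z-Y, Y-X)

evc <- eigen_centrality(g)$vector  # scaled so the largest score is 1

degree(g)[c("P", "Y")]       # P has fewer ties than Y
round(evc[c("P", "Y")], 2)   # yet P scores higher, because its one tie is to H
names(which.max(evc))        # the hub itself has the top score
```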

In This Network

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
data 0 1 0.52 0.33 0.12 0.21 0.49 0.8 1

The network on the right has the nodes scaled such that larger nodes have a larger eigenvector centrality. You can pan and zoom around the network to look at specific areas more closely. The node ID and eigenvector centrality are displayed below a node. You can also use the drop-down in the upper left to find a specific node; nodes are in alphabetical order.

The mean eigenvector centrality in this network is 0.518. The node highlighted in gold on the right has the largest eigenvector centrality at 1; this node may not be the most central node in the network, but it is connected to many important partners. The grey node has the smallest eigenvector centrality at 0.118.

This network has an eigenvector centralization of 0.384. Centralization is a measure of how unevenly node level metrics are distributed in a network. Scores closer to 0 mean the metric is evenly distributed, while scores closer to 1 indicate a concentration of the metric with a few nodes.

Eigenvector Plot

Eigenvector Viewer

Eigenvector Network

Clusters

Clusters Text

Clusters & Communities

Description

Most networks will have clusters of nodes within them. You may know of these clusters beforehand, or one of your questions may be to discover how nodes within your network group together. This tab shows some ways to detect clusters within your network. On the right are tabs for different clustering methods. Within each tab is a small description of how the clustering is performed, as well as plots for visualizing the clusters. The utility of these plots drops quickly as the network grows larger.

Note that clustering is significantly impacted by weighted edges. Make sure you have applied or excluded weighted edges in a way that makes sense for your project.

In This Network

This network has 3 clusters according to edge betweenness clustering, and 2 clusters according to label propagation.

Clusters Viewer

Edge Betweenness

Betweenness tries to find the nodes that act as gatekeepers or bridges in a network. By gradually removing the edges with the highest betweenness, we can hierarchically see which clusters of nodes are most densely connected with each other. Because this method of clustering is hierarchical, a node will belong to several nested clusters.
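A minimal sketch of this method uses two triangles joined by a single bridge, where the bridge edge carries every shortest path between the two sides and is therefore removed first. The graph is invented for illustration.

```r
library(igraph)

# Two tightly knit triangles (A-B-C and D-E-F) joined by the bridge C-D
g <- graph_from_literal(A-B, B-C, A-C,
                        D-E, E-F, D-F,
                        C-D)

eb <- cluster_edge_betweenness(g)

length(eb)      # number of clusters at the best cut of the hierarchy
membership(eb)  # cluster assignment of each node at that cut
```

The best cut separates the two triangles; `as.dendrogram(eb)` exposes the full hierarchy if the nested clusters are of interest.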

Community Plot

Community Dendrogram

Label Propagation

Label Propagation: “In our algorithm every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities.” Label propagation is not a hierarchical method, meaning nodes here will only be counted in a single cluster.
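Label propagation can be sketched with igraph's `cluster_label_prop()`. Because labels are adopted in a random order, results can differ between runs, so the example fixes a seed; the two-clique graph and the seed value are assumptions for illustration.

```r
library(igraph)

# Two fully connected groups of five nodes, joined by a single edge
g <- disjoint_union(make_full_graph(5), make_full_graph(5))
g <- add_edges(g, c(1, 6))

set.seed(1337)  # label propagation is random; fix the seed to repeat a run
lp <- cluster_label_prop(g)

length(lp)      # number of clusters found in this run
membership(lp)  # each node belongs to exactly one cluster
```

Running the algorithm several times with different seeds is a simple way to see how stable the detected clusters are.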

Community Plot

Groups

Groups Text

Groups Header

Description

One of the most useful aspects of network analysis is the ability to compare how different groups are positioned within a network. If one group has a significantly higher average degree, for example, its members would be involved in, and thus have control over, more of the network compared to other groups. The plot on the right shows the nodes in your network with colors representing each group. You should inspect the plot visually to see if nodes of specific groups cluster together in some way (though note the layout of the graph is arbitrary).
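A group comparison of this kind can be sketched by storing the group as a node attribute and averaging a metric within each group. The star network and the membership labels below are made up; the attribute name `group_1` follows the naming used in this report's tables.

```r
library(igraph)

# A hub with four spokes; the hub is in group A, the spokes in group B
g <- make_star(5, mode = "undirected")
V(g)$group_1 <- c("A", "B", "B", "B", "B")

# Mean degree within each group
tapply(degree(g), V(g)$group_1, mean)  # A: 4, B: 1
```

The same pattern works for any node-level measure, for example `betweenness(g, normalized = TRUE)` in place of `degree(g)`.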

In This Network

group_1

group_1 - Membership: B

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
degree 0 1 3.50 1.29 2.00 2.75 3.50 4.25 5.00
mean_geodist 0 1 7.08 2.23 4.89 5.47 6.83 8.44 9.78
norm_betweenness 0 1 0.16 0.25 0.00 0.00 0.05 0.20 0.53
evc 0 1 0.67 0.38 0.20 0.46 0.75 0.97 1.00

group_1 - Membership: A

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
degree 0 1 2.50 1.00 1.00 2.50 3.00 3.00 3.00
mean_geodist 0 1 7.53 1.94 5.11 6.53 7.67 8.67 9.67
norm_betweenness 0 1 0.12 0.15 0.00 0.02 0.07 0.17 0.33
evc 0 1 0.43 0.30 0.21 0.21 0.32 0.54 0.85

group_1 - Membership: C

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
degree 0 1 3.00 1.41 2.00 2.50 3.00 3.50 4.00
mean_geodist 0 1 5.78 0.79 5.22 5.50 5.78 6.06 6.33
norm_betweenness 0 1 0.29 0.14 0.19 0.24 0.29 0.34 0.39
evc 0 1 0.39 0.38 0.12 0.25 0.39 0.52 0.66
group_2

group_2 - Membership: Y

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
degree 0 1 4.00 1.00 3.00 3.50 4.00 4.50 5.00
mean_geodist 0 1 6.78 2.63 4.89 5.28 5.67 7.72 9.78
norm_betweenness 0 1 0.21 0.28 0.00 0.05 0.10 0.31 0.53
evc 0 1 0.83 0.25 0.54 0.75 0.96 0.98 1.00

group_2 - Membership: X

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
degree 0 1 2.50 1.00 1.00 2.50 3.00 3.00 3.00
mean_geodist 0 1 7.53 1.94 5.11 6.53 7.67 8.67 9.67
norm_betweenness 0 1 0.12 0.15 0.00 0.02 0.07 0.17 0.33
evc 0 1 0.43 0.30 0.21 0.21 0.32 0.54 0.85

group_2 - Membership: Z

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
degree 0 1 2.67 1.15 2.00 2.00 2.00 3.00 4.00
mean_geodist 0 1 6.52 1.40 5.22 5.78 6.33 7.17 8.00
norm_betweenness 0 1 0.19 0.19 0.00 0.10 0.19 0.29 0.39
evc 0 1 0.33 0.29 0.12 0.16 0.20 0.43 0.66

This network has 4 group sets. You can see tables detailing the distribution of their network metrics above. If these group memberships have noticeably different distributions, it may be worth investigating why that is the case.

Groups Viewer

Group 1: Network

Group 2: Network

Credits

DataLab

About

This report was coded by Jared Joseph for the UC Davis DataLab; I hope you found it useful! If you have any questions, please send an email to the DataLab and we’ll get back to you!

Session Info

Here is the session info of the instance that generated this report:

sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.3.5       skimr_2.1.3         visNetwork_2.1.0   
[4] knitr_1.36          flexdashboard_0.5.2 igraph_1.2.7       

loaded via a namespace (and not attached):
 [1] sass_0.4.0         tidyr_1.1.4        jsonlite_1.7.2     carData_3.0-4     
 [5] bslib_0.3.1        assertthat_0.2.1   highr_0.9          emo_0.0.0.9000    
 [9] cellranger_1.1.0   yaml_2.2.1         pillar_1.6.4       backports_1.3.0   
[13] lattice_0.20-41    glue_1.4.2         digest_0.6.27      RColorBrewer_1.1-2
[17] ggsignif_0.6.3     colorspace_2.0-0   cowplot_1.1.1      htmltools_0.5.2   
[21] plyr_1.8.6         pkgconfig_2.0.3    broom_0.7.10       haven_2.4.3       
[25] purrr_0.3.4        scales_1.1.1       openxlsx_4.2.4     rio_0.5.27        
[29] tibble_3.1.2       generics_0.1.1     farver_2.1.0       car_3.0-11        
[33] ellipsis_0.3.2     ggpubr_0.4.0       withr_2.4.2        repr_1.1.3        
[37] magrittr_2.0.1     crayon_1.4.2       readxl_1.3.1       evaluate_0.14     
[41] fansi_0.4.2        nlme_3.1-152       rstatix_0.7.0      forcats_0.5.1     
[45] foreign_0.8-81     tools_4.0.4        data.table_1.14.0  hms_1.1.1         
[49] formatR_1.11       lifecycle_1.0.1    stringr_1.4.0      munsell_0.5.0     
[53] zip_2.2.0          compiler_4.0.4     jquerylib_0.1.4    rlang_0.4.11      
[57] grid_4.0.4         htmlwidgets_1.5.4  base64enc_0.1-3    labeling_0.4.2    
[61] rmarkdown_2.11     gtable_0.3.0       abind_1.4-5        DBI_1.1.1         
[65] curl_4.3.2         reshape2_1.4.4     R6_2.5.1           lubridate_1.8.0   
[69] dplyr_1.0.6        fastmap_1.1.0      utf8_1.2.1         ape_5.5           
[73] stringi_1.7.5      parallel_4.0.4     Rcpp_1.0.7        
 [ reached getOption("max.print") -- omitted 3 entries ]

References

Kadushin, Charles. 2012. Understanding Social Networks: Theories, Concepts, and Findings. New York, NY: Oxford University Press.
Wasserman, Stanley, and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge, UK: Cambridge University Press.
---
title: "Network Report"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
    logo: ../img/dl_logo.png
#    theme: 
#      version: 4
#      bootswatch: slate
#    includes:
#      after_body: footer.html
bibliography: ../references.bib
link-citations: TRUE
---

```{r setup, include=FALSE}
library(igraph)
library(flexdashboard)
library(knitr)
library(visNetwork)
library(skimr)
library(ggplot2)
# also need the emo package ( devtools::install_github("hadley/emo") )
# also need reshape2 package
# also need ggpubr package
# also need RColorBrewer
# also need webshot
# also need to run webshot::install_phantomjs()

# options

## report author/ who knit this
.author = "Jared Joseph"

## Set a random seed
.seed = 1337

## load in network
rgraph = readRDS("./data/sample_net.rds")

# set color pal
.pal = RColorBrewer::brewer.pal(8, "Dark2")

# if you would like a custom column to show in node tooltips, please indicate that here
# the format should be a named list, with the title as the name.
# for example c("Title of Value" = "Column name in dataframe")

#custom_tooltips = list("Question" = "text", "Type" = "type")

# if you do not have any custom data, please include this

custom_tooltips = NA

# Partner Info

.partner_name = "DataLab"
.partner_department = "UC Davis Library"
.nodes_are = "Concepts"
.edges_are = "Conceptual Links"
.project_objectives = "Showcase"

# knitr options
options(max.print = 75)
opts_chunk$set(echo=TRUE,
               cache=FALSE,
               prompt=FALSE,
               tidy=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE,
               fig.retina = 3)
opts_knit$set(width=75)
```

```{r data_load, include=FALSE}
## is this network directed?
.directed = is_directed(rgraph)
## is this network weighted? (the flag keeps its original name, .multiplex)
.multiplex = is_weighted(rgraph)
## is there a group in the nodes data? (vertex attributes named group_1, group_2, ...)
.group_names = grep("^group_\\d+$", vertex_attr_names(rgraph), value = TRUE)
.groups = length(.group_names) > 0
# if there are groups, set color of nodes
if(.groups){
  for(group_iter in .group_names){
    # one palette color per level of the group variable
    rgraph = set_vertex_attr(rgraph, paste0(group_iter, "_color"), value = .pal[as.factor(vertex_attr(rgraph, group_iter))])
  }
}

# number of components
.component_num = components(rgraph)$no
# save total network
.bac_net = rgraph
```

```{r custom_errors, include=FALSE}
if(any(is.na(E(rgraph)$weight))){stop("The network is weighted, but there is an NA in weights. This will cause a fatal error later.")}
```

```{r data_edit, include=FALSE}
# largest component

## find ids of nodes in largest component
.components = igraph::components(rgraph, mode="weak")
.largest_cluster_id = which.max(.components$csize)
.largest_vert_ids <- V(rgraph)[.components$membership == .largest_cluster_id]

## get largest component
rgraph = igraph::induced_subgraph(rgraph, .largest_vert_ids)

# generate measures

## degree
rgraph = set_vertex_attr(rgraph, "degree", value = degree(rgraph, mode = "all"))
if(.directed){
  rgraph = set_vertex_attr(rgraph, "degree_in", value = degree(rgraph, mode = "in"))
  rgraph = set_vertex_attr(rgraph, "degree_out", value = degree(rgraph, mode = "out"))
}

## mean geodesic
# make distance table
.distable = distances(rgraph)
# replace inf and 0s (self) with NA for mean
.distable[is.infinite(.distable)] = NA
.distable[.distable == 0] = NA
# set values
rgraph = set_vertex_attr(rgraph, "mean_geodist", value = apply(.distable, 1, mean, na.rm = TRUE))

## normalized betweenness
rgraph = set_vertex_attr(rgraph, "norm_betweenness", value = betweenness(rgraph, directed = .directed, normalized = TRUE))

## eigenvector
rgraph = set_vertex_attr(rgraph, "evc", value = evcent(rgraph, directed = .directed)$vector)

## make visNetwork object for plotting
vnet = visNetwork::toVisNetworkData(rgraph)
# add edge weights
vnet$edges$width = if(.multiplex){E(rgraph)$weight} else {NULL}

# custom tooltips
## deal with custom tooltips if any set

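# custom_tooltips is a single NA when unset; all() keeps the condition length-one
if(!all(is.na(custom_tooltips))){
  
  ## take values from data
custom_tooltips = lapply(seq_along(custom_tooltips), FUN = function(i){
  
  # one-element list: tooltip title as the name, node column as the value
  setNames(list(vnet$nodes[, custom_tooltips[[i]]]), names(custom_tooltips)[[i]])
  
})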

# unlist it
custom_tooltips = unlist(custom_tooltips, recursive = FALSE)

## format them as html
custom_tooltips = lapply(seq_along(custom_tooltips), function(i){
  
  paste0(
    "",
    "", names(custom_tooltips)[[i]], " ",
    "", custom_tooltips[[i]], "",
    ""
  )
  
  })
}

```

```{r data_checks, include=FALSE}
# if any of these are true, add in the warning box
.warm = any(
  # more than one component
  .component_num > 1,
  # total degree outliers
  length(boxplot.stats(vnet$nodes$degree)$out) > 0,
  # check if any nodes in edges missing from attributes
  !all(unique(c(vnet$edges$from, vnet$edges$to)) %in% vnet$nodes$id)
  )
```

```{js, include=FALSE}
document.getElementsByClassName('navbar-logo')[0].onclick = function(){
   location.href= 'https://datalab.ucdavis.edu/'
}
```

```{r img_links, echo=FALSE}
# emojis
.net_url1 = "../img/male-teacher_emoji.png"
.net_url2 = "../img/female-student_emoji.png"
.net_url3 = "../img/female-health-worker_emoji.png"
.net_url4 = "../img/male-scientist_emoji.png"
```

Overview
================================================================================

Welcome! {data-width=400}
--------------------------------------------------------------------------------

### Welcome!

Report generated: `r format(Sys.time(), '%B %d, %Y')`

Knit by: `r paste(.author)`

#### Partner Info

|            |                                 |
|------------|--------------------------------:|
| Partner    | `r paste(.partner_name)`        |
| Department | `r paste(.partner_department)`  |
| Nodes      | `r paste(.nodes_are)`           |
| Edges      | `r paste(.edges_are)`           |


**Objectives:** `r paste(.project_objectives)`

#### Report Introduction

This report will ingest pre-formatted network data and walk readers through some preliminary network metrics. All of the text in this report is generalized to any network it could render; it is not specific to the data displayed. Because of this, some metrics may not be relevant to a specific data set. Please talk with the DataLab about the outcomes of this report for help with interpretation and applications of these outcomes.

To navigate this report, use the tabs at the top of the page. Each page will walk you through one network metric and its interpretation. The remainder of this tab will cover some network basics. If you have any questions, please contact us at datalab@ucdavis.edu.

#### Network Background

You may be familiar with tabular data; rows and columns containing information. You can see a representation of this data to the right on top. While this is a tidy way to store data, it artificially atomizes or separates many of the things we are interested in as researchers, social or otherwise. Network analysis is a tool to work with *relational* data, a.k.a. information about how entities are connected with each other. For example, the diagram on the bottom right shows the same data as the table above, with the added benefit of showing how these individuals are connected to each other. Hover over the people to reveal the data about them.

Rather than looking only at attributes of specific data points, we are looking at the connections between data. In network analysis, data points are called *nodes* or *vertices*, and the connections between them are called *edges* or *ties*. Vertices can be anything---people, places, words, concepts---they are usually mapped into rows in a data frame. Edges contain any information on how these things connect or are related to each other. These parts create a *network* or *graph*, defined as "finite set or sets of actors and the relation or relations defined on them" [@wassermanSocialNetworkAnalysis1994].

Data Types {data-width=600}
--------------------------------------------------------------------------------

### Example Data Frame

| Person | Name | Age | Widgets |
|-|:-|:-|:-|
| ![](`r .net_url1`){width="10%"} | J | 30 | 1 |
| ![](`r .net_url2`){width="10%"} | Y | 21 | 3 |
| ![](`r .net_url3`){width="10%"} | G | 32 | 4 |
| ![](`r .net_url4`){width="10%"} | Z | 48 | 8 |

### Example Network

```{r toy_net, echo=FALSE, out.width='100%'}
# nodes: one image per person, with a hover tooltip listing their attributes
.toy_nodes = data.frame(id = 1:4,
                        shape = "image",
                        title = c("Name: J<br>Age: 30<br>Widgets: 1",
                                  "Name: Y<br>Age: 21<br>Widgets: 3",
                                  "Name: G<br>Age: 32<br>Widgets: 4",
                                  "Name: Z<br>Age: 48<br>Widgets: 8"),
                        image = c(.net_url1, .net_url2, .net_url3, .net_url4))

# edges: labelled relationships between the four people
.toy_edges = data.frame(from = c(2,4,3,3),
                        to = c(1,2,4,2),
                        label = c("Siblings", "Student", "Friends", "Parent"))

.toy_net = visNetwork(.toy_nodes, .toy_edges, width = "100%") %>%
  visNodes(shapeProperties = list(useBorderWithImage = FALSE), size = 50) %>%
  visEdges(length = 200, scaling = list(min = 400)) %>%
  visInteraction(zoomView = FALSE) %>%
  visIgraphLayout(physics = TRUE, randomSeed = .seed)

.toy_net$sizingPolicy$browser$fill = TRUE

.toy_net
```

Network Metadata
================================================================================

Explanation Text
--------------------------------------------------------------------------------

### Network Characteristics {data-height=320}

| Network Property | Measurement |
|:-|:-|
| Nodes | `r vcount(rgraph)` |
| Edges | `r ecount(rgraph)` |
| Components | `r ifelse(components(.bac_net)$no == 1, 1, paste0("", components(.bac_net)$no, ""))` |
| Isolates | `r ifelse(sum(degree(.bac_net) == 0) == 0, 0, paste0("", sum(degree(.bac_net) == 0), ""))` |
| Directed | `r ifelse(.directed, emo::ji("check"), emo::ji("x"))` |
| Weighted Edges| `r ifelse(.multiplex, emo::ji("check"), emo::ji("x"))` |
| Density | `r round(edge_density(rgraph), digits = 3)` |
| Diameter | `r diameter(rgraph)` |
| Degree Centralization | `r round(centralization.degree(rgraph, mode = "all", normalized = TRUE)$centralization, digits = 3)` |
| Betweenness Centralization | `r round(centralization.betweenness(rgraph, directed = .directed, normalized = TRUE)$centralization, digits = 3)` |
| Eigenvector Centralization | `r round(centralization.evcent(rgraph, directed = .directed, normalized = TRUE)$centralization, digits = 3)` |

### Explanations

`r if(!.warm) {""}`

#### Overview

This tab contains some meta-information about the network. This includes the number of nodes and edges, as well as more specialized characteristics and measures.
The rest of this space is dedicated to explaining those things in a general way. If you would like to see more information about specific nodes in the network, you can hover over them on the right to see a tooltip about that node. You can also find nodes sorted alphabetically in the top left pull-down menu.

#### Components & Isolates

In network analysis, lone nodes with no edges are called isolates, while groups of nodes that are connected to each other but disconnected from the rest of the network are called components. *This report works only on the single largest component from your dataset.* If you have multiple components in your data, make sure this is what you expect. Because many network metrics are based on the distances between nodes, and there is no path between nodes in different components, disconnected nodes result in infinite values. If you are interested in multiple components, please pass them through the report individually.

#### Directed/Un-Directed

Networks can either be *directed* or *un-directed*. A directed network treats the edges between nodes as having a specific direction of flow, while an un-directed network considers all edges to be mutual. An example of each is presented below.

A directed network tracks which node is the source and which node is the receiver for an edge. Take for example the *follow* mechanic on Twitter. User A can follow User B, creating a directed edge from A to B, but B does not have to follow A in return. This can be useful when trying to understand the flows of finite resources such as money or goods.

An un-directed network treats all ties as mutual, such that A and B are both involved equally in a tie. An example is the *friend* mechanic on Facebook. Once a friendship is established, both users are considered equal in the tie. This can be helpful when you do not have information on which node initiates a tie, or when events happen equally to a group of nodes, such as all nodes being connected through co-membership in a group.

Which of these will be useful to you will likely change from project to project.
However, it is vital to understand what kind of network you are working with, as many of the network calculations we will talk about later change their behavior depending on whether the network is directed or not.

#### Weighted Edges

Edges in most networks are un-weighted, such that edges either exist or they do not. It is possible to have weighted edges in a network, such that some edges are considered more important than others. This is often used to represent multiple interactions. For example, if your network is composed of people, an un-weighted edge might denote two individuals as friends, whereas a weighted edge may indicate how many times two individuals have spent time together; in this situation an edge weight of 1 would be one meeting, while an edge weight of 5 would be five meetings.

#### Projected/Bipartite

Often, you will not have individual level network data, but you will have data on group membership. For example, if you wanted to map the social networks of students, but don't know who they actually hang around with, you may be able to use class rosters to build an approximate network. A network that links students to classes in this way is called a *bipartite* or *two-mode* network. You can "collapse" such a network into a student network by assuming every student connected to a class is connected to every other student in that class; the result is called a *projected network*. The same is true with classes, such that classes are related to each other if a single student is enrolled in both. This assumption may not always be correct, and you need to take care if you are going to make it in your research. If a class has 300 students, it is most likely not correct to assume every student knows every other student in that class.

#### Density

*Density* is the first real graph level metric that helps you understand what is particular about the network you are looking at. The density of a network is a numerical score showing how many ties exist in a network, relative to the maximum possible in that network.
Mathematically that is $\frac{\text{Actual Edges}}{\text{Possible Edges}}$, where actual edges is the number of edges in the network, and possible edges is the number of edges there would be if every single node in the network were connected to every other node. Networks that are more densely connected are considered to be more cohesive and robust. This means that the removal of any specific edge or node will not have a great effect on the network as a whole. It also typically means that any one node in the network will be more likely to have access to whatever resources are in the network, as there are more potential connections in the network to search for resources.

#### Diameter

The *Diameter* of a network is the longest shortest path (geodesic) in a network. Put another way, if you treated every node in a network as a starting point and found the shortest path to the most distant other node, the longest of those paths would be the diameter. The path between these two nodes is the longest "shortest path" because it can't include unnecessary diversions like loops or backtracking. The diameter can be a helpful measure of network size and sparsity. If you compare two networks, both with 25 nodes, a network with a diameter of 5 will be much more tightly connected than a network with a diameter of 15.

#### Centralization

*Freeman Centralization* (usually just called *centralization*) gives a sense of the shape of the network, namely how node level measures are distributed in a network. We'll discuss node level measures next, but for now it is only important to understand that node level measures are numeric scores assigned to specific nodes rather than the network as a whole. This means that each node may have a different value. Centralization is a measure of how unevenly node level metrics are distributed in a network. Scores closer to 0 mean the metric is evenly distributed, while scores closer to 1 indicate a concentration of the metric with a few nodes.
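
To make density and diameter more concrete, here is a minimal sketch on a toy network. It assumes the `igraph` package is installed; the five-node ring and the object names (`toy_ring`, `manual_density`, and so on) are made up for illustration and are not part of the data in this report.

```r
library(igraph)

# a hypothetical toy network: 5 nodes joined in a ring (un-directed)
toy_ring = make_ring(5)

# density by hand: actual edges / possible edges
n_nodes = vcount(toy_ring)
actual_edges = ecount(toy_ring)
possible_edges = n_nodes * (n_nodes - 1) / 2   # un-directed, no self-loops
manual_density = actual_edges / possible_edges

# the same quantity as reported by igraph
igraph_density = edge_density(toy_ring)

# diameter by hand: the longest of all the shortest paths between nodes
all_shortest = distances(toy_ring)
manual_diameter = max(all_shortest)
igraph_diameter = diameter(toy_ring)

print(c(manual = manual_density, igraph = igraph_density))
print(c(manual = manual_diameter, igraph = igraph_diameter))
```

In this ring, 5 of the 10 possible edges exist, so the density is 0.5, and no node is more than 2 steps away from any other, so the diameter is 2. For a directed network the number of possible edges would be `n_nodes * (n_nodes - 1)` instead, since each pair of nodes can be tied in both directions.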
Network Viewer {.tabset} -------------------------------------------------------------------------------- ```{r overview_net_gen, echo=FALSE} # make list for multiple networks if multiple groups net_list = list() # if there are no groups do this if(is.null(V(rgraph)$group_1)){ overview_net = vnet$nodes overview_net$title = paste0("", "", "", "", "", "", "", "", "", "", "", "", "", if(.directed){paste0( "", "", "", "", "", "", "", "") }, "", "", "", "", "", "", "", "", "", "", "", "", if(all(!is.na(custom_tooltips))){ apply(data.frame(custom_tooltips, stringsAsFactors = FALSE), 1, FUN = function(trow){paste(trow, collapse = " ")}) }, "
MeasurementValue
Node ID ", overview_net$id, "
Total Degree ", round(overview_net$degree, digits = 3), "
Total In Degree ", round(overview_net$degree_in, digits = 3), "
Total Out Degree ", round(overview_net$degree_out, digits = 3), "
Mean Geodesic Distance ", round(overview_net$mean_geodist, digits = 3), "
Norm Betweenness Centrality ", round(overview_net$norm_betweenness, digits = 3), "
Eigenvector Centrality ", round(overview_net$evc, digits = 3), "
") net_list[["no_group"]] = visNetwork(overview_net, vnet$edges) %>% visInteraction(zoomView = TRUE, dragView = TRUE) %>% visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% visIgraphLayout(physics = TRUE, randomSeed = .seed) } else { # if there are groups do this for(group_iter in vertex_attr_names(rgraph)[grep("^group_\\d+$", vertex_attr_names(rgraph))]){ overview_net = vnet$nodes eval(parse(text = paste0("overview_net$group = vnet$nodes$", group_iter))) eval(parse(text = paste0("overview_net$color = vnet$nodes$", group_iter, "_color"))) overview_net$title = paste0("", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", if(.directed){paste0( "", "", "", "", "", "", "", "") }, "", "", "", "", "", "", "", "", "", "", "", "", if(all(!is.na(custom_tooltips))){ apply(data.frame(custom_tooltips, stringsAsFactors = FALSE), 1, FUN = function(trow){paste(trow, collapse = " ")}) }, "
MeasurementValue
Node ID ", overview_net$id, "
Group ", overview_net$group, "
Total Degree ", round(overview_net$degree, digits = 3), "
In Degree ", round(overview_net$degree_in, digits = 3), "
Out Degree ", round(overview_net$degree_out, digits = 3), "
Mean Geodesic Distance ", round(overview_net$mean_geodist, digits = 3), "
Norm Betweenness Centrality ", round(overview_net$norm_betweenness, digits = 3), "
Eigenvector Centrality ", round(overview_net$evc, digits = 3), "
") net_list[[group_iter]] = visNetwork(overview_net, vnet$edges) %>% visInteraction(zoomView = TRUE, dragView = TRUE) %>% visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE, selectedBy = "group") %>% visIgraphLayout(physics = TRUE, randomSeed = .seed) } } ``` ```{r overview_net_display, echo=FALSE, results='asis'} .group_iter = 1 for(net in net_list){ cat(paste0("### Group: ", .group_iter, "\n")) .group_iter = .group_iter + 1 cat(knitr::knit_print(net)) } ``` ### Component Checker #### Plot ```{r component_check, echo=FALSE} plot(.bac_net, layout = layout_with_drl(.bac_net)) ``` #### Nodes not in the largest component `r paste(unique(unlist(groups(igraph::clusters(.bac_net, mode="weak"))[-which.max(igraph::clusters(.bac_net, mode="weak")$csize)])))` Degree ================================================================================ Degree Text -------------------------------------------------------------------------------- ### Degree Measure #### Description **Degree** counts how many edges are connected to a node. Degree gives a very rough measure of how popular or central a node is in the network. If a node has more ties, it may indicate that node as being more central or important in the network as a whole. Degree is a raw count of the number of edges a node has, this makes the interpretation of degree highly dependent on the size of the network. In a small network with only 25 total edges, having 10 of them would be significant. In a larger network with 250 total edges, 10 edges could be less impressive. Degree should thus be interpreted in the context of other nodes in the network. Degree count can be affected if the network is un-directed or directed; this network is `r ifelse(.directed, paste0("**directed**, so the following will apply"), paste0("**un-directed**, so the following will not apply"))`. If the network is directed there will be different counts for in-degree and out-degree. 
In-degree counts the number of edges a node is receiving, while out-degree counts the number of edges a node is sending. Total degree is the sum of these two numbers. If a node is receiving many edges but does not send any, this can be indicative of its role in the network, though the meaning of this pattern will be specific to each case.

#### In This Network

```{r degree_skim, echo=FALSE}
skim_without_charts(vnet$nodes$degree) %>% yank("numeric")

# each directed summary gets its own `if` so that knitr prints both of them;
# inside a single block only the last value would be shown
if(.directed){
  skim_without_charts(vnet$nodes$degree_in) %>% yank("numeric")
}

if(.directed){
  skim_without_charts(vnet$nodes$degree_out) %>% yank("numeric")
}
```

`r if(.directed) {""}`

`r if(!.directed) {""}`

#### Degree Plot

```{r degree_hist, echo=FALSE, out.width="100%"}
ggplot(data = vnet$nodes, aes(x = degree)) +
  geom_histogram() +
  theme_minimal() +
  xlab("Total Degree") +
  labs(title = "Node Total Degree")

# separate `if` blocks so that both directed histograms are printed
if(.directed){
  ggplot(data = vnet$nodes, aes(x = degree_in)) +
    geom_histogram() +
    theme_minimal() +
    xlab("In Degree") +
    labs(title = "Node In Degree")
}

if(.directed){
  ggplot(data = vnet$nodes, aes(x = degree_out)) +
    geom_histogram() +
    theme_minimal() +
    xlab("Out Degree") +
    labs(title = "Node Out Degree")
}
```

Network Viewer
--------------------------------------------------------------------------------

### Degree Network

```{r degree_net, echo=FALSE}
# make dataframe for vis of degree
degree_df = vnet$nodes
degree_df$value = degree_df$degree ^ 2
degree_df$color = "#355B85"

# color the highest-degree node(s) gold and the lowest-degree node(s) grey
degree_df[which(max(degree_df$degree) == degree_df$degree), "color"] = "#FFBF00"
degree_df[which(min(degree_df$degree) == degree_df$degree), "color"] = "#CDD6E0"

# add label
degree_df$label = paste0(degree_df$id, "\n", round(degree_df$degree, 3))

visNetwork(degree_df, vnet$edges) %>%
  visInteraction(zoomView = TRUE, dragView = TRUE) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visIgraphLayout(physics = TRUE, randomSeed = .seed)
```

Geodesic Distance
================================================================================

Geodesic Distance Text
--------------------------------------------------------------------------------

### Geodesic Distance Measure

#### Description

**Geodesic Distance** is "the length of the shortest path via the edges or binary connections between nodes" [@kadushinUnderstandingSocialNetworks2012]. In other words, if we treat the network as a map we can move along, with the nodes being stopping places and the edges being paths, the geodesic is the shortest possible path we can use to walk between two nodes.

Nodes that on average have a shorter geodesic distance to all the other nodes in the network are considered to have greater access to the resources in a network. This is because a node with a low average geodesic distance can theoretically "reach" the other nodes with less effort because it does not need to travel as far.

This is our first instance of how network structure, not node attributes, can inform us about the nodes in a network. Essentially, looking at the network as a whole can tell us things about the people in it that are lost if we look only at individuals. Note that while there is a correlation between degree counts and mean geodesic distance, one does not cause the other.

#### In This Network

```{r geodesic_skim, echo=FALSE}
skim_without_charts(vnet$nodes$mean_geodist) %>% yank("numeric")
```

The network on the right has the nodes scaled such that larger nodes have a **lower** mean geodesic distance. You can pan and zoom around the network to look at specific areas more closely. The node ID and geodesic distance are displayed below a node. You can also use the drop-down in the upper left to find a specific node; nodes are in alphabetical order.

The mean geodesic distance in this network is `r round(mean(vnet$nodes$mean_geodist), digits = 3)`.
The node highlighted in gold on the right has the lowest mean geodesic distance at `r round(min(vnet$nodes$mean_geodist), digits = 3)`; it will usually be centrally located in the network. The grey node has the highest at `r round(max(vnet$nodes$mean_geodist), digits = 3)`, and would need to travel through the whole network to reach the nodes on the opposite side.

#### Geodesic Distance Plot

```{r geodesic_hist, echo=FALSE, out.width="100%"}
ggplot(data = vnet$nodes, aes(x = mean_geodist)) +
  geom_histogram() +
  theme_minimal() +
  xlab("Geodesic Distance") +
  labs(title = "Node Geodesic Distance")
```

Network Viewer
--------------------------------------------------------------------------------

### Geodesic Distance Network

```{r Geodesic_net, echo=FALSE, message=FALSE, warning=FALSE}
# make df to vis geodesic distances
gdist_df = vnet$nodes
# flip the sign so that nodes with a lower mean distance are drawn larger
gdist_df$value = (gdist_df$mean_geodist * -1)
gdist_df$color = "#355B85"

# replace min average geodesic with gold, max with grey
gdist_df$color[which(gdist_df$mean_geodist == min(gdist_df$mean_geodist))] = "#FFBF00"
gdist_df$color[which(gdist_df$mean_geodist == max(gdist_df$mean_geodist))] = "#CDD6E0"

# add label as geodesic distance, rounding to 3 digits
gdist_df$label = paste0(gdist_df$id, "\n", round(gdist_df$mean_geodist, 3))

# plot
visNetwork(gdist_df, vnet$edges) %>%
  visInteraction(zoomView = TRUE, dragView = TRUE) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visIgraphLayout(physics = TRUE, randomSeed = .seed)
```

Betweenness
================================================================================

Betweenness Text
--------------------------------------------------------------------------------

### Betweenness Measure

#### Description

**Betweenness Centrality** tries to calculate the extent to which a node acts as a gatekeeper or *broker* in the network. A broker bridges two otherwise disconnected segments in a network.
If a network would break into two disconnected parts when a node is removed, that node will have a high betweenness centrality. Fragmenting the network is not a prerequisite, however; simply acting as an effective "shortcut" in a network can also raise a node's betweenness centrality. Betweenness is calculated using geodesic distances, and gives a higher score to nodes that lie on more geodesic paths. Centrality scores such as this are usually normalized such that their scores all sum to 1. This way, you can easily compare nodes within the network (but not between networks), and understand how nodes relate to each other structurally. It is possible for a node to have a betweenness score of 0 if no geodesic paths pass through it.

#### In This Network

```{r betweenness_skim, echo=FALSE}
skim_without_charts(vnet$nodes$norm_betweenness) %>% yank("numeric")
```

The network on the right has the nodes scaled such that larger nodes have a larger betweenness centrality. You can pan and zoom around the network to look at specific areas more closely. The node ID and betweenness centrality are displayed below a node. You can also use the drop-down in the upper left to find a specific node; nodes are in alphabetical order.

The mean betweenness centrality in this network is `r round(mean(vnet$nodes$norm_betweenness), digits = 3)`. The node highlighted in gold on the right has the largest betweenness centrality at `r round(max(vnet$nodes$norm_betweenness), digits = 3)`; this node likely acts as a gatekeeper in the network. Imagine this node were removed: how would resources need to flow differently in the network without it? The grey node has the smallest betweenness centrality at `r round(min(vnet$nodes$norm_betweenness), digits = 3)`; the network would likely be unaffected by its removal in terms of resource flows.
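
If you would like to see the gatekeeper idea in miniature, the sketch below builds a toy network of two triangles joined through a single node. It assumes the `igraph` package is installed, and the network and object names (`toy_bridge`, `raw_bet`, `norm_bet`, `broker`) are hypothetical. Note that igraph's `normalized = TRUE` rescales by the number of node pairs; that is one common choice and may differ from how the scores in this report were normalized.

```r
library(igraph)

# hypothetical toy network: two triangles (A-B-C and E-F-G) joined through D
toy_bridge = graph_from_literal(A-B, B-C, A-C, C-D, D-E, E-F, F-G, E-G)

# raw betweenness: how many geodesic paths between other nodes pass through a node
raw_bet = betweenness(toy_bridge, directed = FALSE)

# igraph's normalized version rescales the raw score by the number of node pairs
norm_bet = betweenness(toy_bridge, directed = FALSE, normalized = TRUE)

print(round(cbind(raw = raw_bet, normalized = norm_bet), 3))

# the node with the highest score is the broker between the two triangles
broker = names(which.max(raw_bet))
print(broker)
```

Node D sits on every path between the two triangles, so it gets the highest score, while nodes such as A or G lie on no shortest paths between other nodes and score 0.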
#### Betweenness Plot

```{r betweenness_hist, echo=FALSE, out.width="100%"}
ggplot(data = vnet$nodes, aes(x = norm_betweenness)) +
  geom_histogram() +
  theme_minimal() +
  xlab("Node Betweenness") +
  labs(title = "Node Normalized Betweenness")
```

Betweenness Viewer
--------------------------------------------------------------------------------

### Betweenness Network

```{r betweenness_net, echo=FALSE, message=FALSE, warning=FALSE}
# make df to vis betweenness centrality
bet_df = vnet$nodes
bet_df$value = bet_df$norm_betweenness
bet_df$color = "#355B85"

# add label as normalized betweenness, rounding to 3 digits
bet_df$label = paste0(bet_df$id, "\n", round(bet_df$norm_betweenness, 3))

# replace max betweenness with gold, min with grey
bet_df$color[which(bet_df$norm_betweenness == max(bet_df$norm_betweenness))] = "#FFBF00"
bet_df$color[which(bet_df$norm_betweenness == min(bet_df$norm_betweenness))] = "#CDD6E0"

# plot
visNetwork(bet_df, vnet$edges) %>%
  visInteraction(zoomView = TRUE, dragView = TRUE) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visIgraphLayout(physics = TRUE, randomSeed = .seed)
```

Eigenvector
================================================================================

Eigenvector Text
--------------------------------------------------------------------------------

### Eigenvector Measure

#### Description

**Eigenvector Centrality** is commonly known as a measure of "popular friends." Rather than looking at the network position of a node, it looks at the network positions of nodes connected to it. Nodes with a high eigenvector score will be connected to nodes more prominent in the network. Nodes with low degree can have high eigenvector scores if they are connected to important nodes. In real life networks this can be interpreted as being close to influential others in a network.
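
The "popular friends" idea can be sketched with a small toy network in which a node with only one tie outranks a node with two. This assumes the `igraph` package is installed; the network and the object names (`toy_hub`, `toy_degree`, `toy_evc`) are invented for illustration only.

```r
library(igraph)

# hypothetical toy network: hub H with five neighbors (A, B, C, D, P),
# plus a chain D - X - Y - Z trailing away from the hub
toy_hub = graph_from_literal(H-A, H-B, H-C, H-D, H-P, D-X, X-Y, Y-Z)

toy_degree = degree(toy_hub)
toy_evc = setNames(eigen_centrality(toy_hub)$vector, V(toy_hub)$name)

# compare P (one tie, but to the hub) with Y (two ties, far from the hub)
comparison = cbind(degree = toy_degree, eigenvector = toy_evc)
print(round(comparison[c("H", "P", "Y"), ], 3))
```

P has a lower degree than Y but a higher eigenvector score, because its single tie is to the most connected node in the network.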
#### In This Network

```{r evc_skim, echo=FALSE}
skim_without_charts(vnet$nodes$evc) %>% yank("numeric")
```

The network on the right has the nodes scaled such that larger nodes have a larger eigenvector centrality. You can pan and zoom around the network to look at specific areas more closely. The node ID and eigenvector centrality are displayed below a node. You can also use the drop-down in the upper left to find a specific node; nodes are in alphabetical order.

The mean eigenvector centrality in this network is `r round(mean(vnet$nodes$evc), digits = 3)`. The node highlighted in gold on the right has the largest eigenvector centrality at `r round(max(vnet$nodes$evc), digits = 3)`; this node may not be the most central node in the network, but it is connected to many important partners. The grey node has the smallest eigenvector centrality at `r round(min(vnet$nodes$evc), digits = 3)`.

This network has an eigenvector centralization of `r round(centralization.evcent(rgraph, directed = .directed, normalized = TRUE)$centralization, digits = 3)`. Centralization is a measure of how unevenly node level metrics are distributed in a network. Scores closer to 0 mean the metric is evenly distributed, while scores closer to 1 indicate a concentration of the metric with a few nodes.
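
The two ends of the centralization scale can be sketched with a pair of toy networks. For simplicity this example uses degree rather than eigenvector centralization, and assumes the `igraph` package is installed; `centr_degree()` is the current igraph name for the degree centralization function, and the networks and object names (`toy_ring`, `toy_star`, `ring_cent`, `star_cent`) are hypothetical.

```r
library(igraph)

# two hypothetical toy networks with 6 nodes each
toy_ring = make_ring(6)                        # every node has degree 2
toy_star = make_star(6, mode = "undirected")   # one hub tied to five spokes

# degree centralization; loops = FALSE because neither network has self-loops
ring_cent = centr_degree(toy_ring, mode = "all", loops = FALSE,
                         normalized = TRUE)$centralization
star_cent = centr_degree(toy_star, mode = "all", loops = FALSE,
                         normalized = TRUE)$centralization

print(c(ring = ring_cent, star = star_cent))
```

In the ring every node has the same degree, so the centralization is 0; in the star all ties run through a single hub, which is the most uneven arrangement possible and gives a centralization of 1.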
#### Eigenvector Plot

```{r evc_hist, echo=FALSE, out.width="100%"}
ggplot(data = vnet$nodes, aes(x = evc)) +
  geom_histogram() +
  theme_minimal() +
  xlab("Node Eigenvector") +
  labs(title = "Node Eigenvector Centrality")
```

Eigenvector Viewer
--------------------------------------------------------------------------------

### Eigenvector Network

```{r eigenvector_net, echo=FALSE, message=FALSE, warning=FALSE}
# make df to vis eigenvector centrality
evc_df = vnet$nodes
evc_df$value = evc_df$evc
evc_df$color = "#355B85"

# add label as eigenvector centrality, rounding to 3 digits
evc_df$label = paste0(evc_df$id, "\n", round(evc_df$evc, 3))

# replace max eigenvector centrality with gold, min with grey
evc_df$color[which(evc_df$evc == max(evc_df$evc))] = "#FFBF00"
evc_df$color[which(evc_df$evc == min(evc_df$evc))] = "#CDD6E0"

# plot
visNetwork(evc_df, vnet$edges) %>%
  visInteraction(zoomView = TRUE, dragView = TRUE) %>%
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visIgraphLayout(physics = TRUE, randomSeed = .seed)
```

Clusters
================================================================================

```{r cluster_label_calc, echo=FALSE}
# detect communities two ways; edge weights are used only for weighted networks
bet_clust = cluster_edge_betweenness(rgraph, weights = if(.multiplex){E(rgraph)$weight} else {NULL}, directed = .directed)
lab_clust = cluster_label_prop(rgraph, weights = if(.multiplex){E(rgraph)$weight} else {NA})
```

Clusters Text
--------------------------------------------------------------------------------

### Clusters & Communities

#### Description

Most networks will have clusters of nodes within them. You may know of these clusters beforehand, or one of your questions may be to discover how nodes within your network group together. This tab shows some ways to detect clusters within your network. On the right are tabs for different clustering methods. Within each tab is a small description of how the clustering is performed, as well as plots for visualizing the clusters. The utility of these plots drops quickly as the network grows larger.
Note that clustering is significantly impacted by weighted edges. Make sure you have applied or excluded weighted edges in a way that makes sense for your project.

#### In This Network

This network has `r length(bet_clust[])` clusters according to edge betweenness clustering, and `r length(lab_clust[])` clusters according to label propagation.

Clusters Viewer {.tabset}
--------------------------------------------------------------------------------

### Edge Betweenness

Edge betweenness tries to find the edges that act as bridges in a network. By gradually removing the edges with the highest betweenness, we can hierarchically see which clusters of nodes are most densely connected with each other. Because this method of clustering is hierarchical, a node will belong to several nested clusters, one at each level of the hierarchy.

#### Community Plot

```{r cluster_edge_plot, echo=FALSE, out.width="100%"}
plot(bet_clust, rgraph, edge.width = if(.multiplex){E(rgraph)$weight} else {NULL})
```

#### Community Dendrogram

```{r cluster_edge_dendo, echo=FALSE, out.width="100%"}
plot_dendrogram(bet_clust)
```

### Label Propagation

Label Propagation: "In our algorithm every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities." Label propagation is not a hierarchical method, meaning nodes here will only be counted in a single cluster.
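
To see how the two clustering methods differ, here is a minimal sketch on a toy network made of two tightly knit groups joined by one edge. It assumes the `igraph` package is installed; the network and object names (`toy_groups`, `toy_bet`, `toy_lab`) are made up for illustration and do not come from this report's data.

```r
library(igraph)

# hypothetical toy network: two groups of four fully connected nodes,
# joined by a single bridging edge between node 4 and node 5
toy_groups = disjoint_union(make_full_graph(4), make_full_graph(4))
toy_groups = add_edges(toy_groups, c(4, 5))

# hierarchical clustering by repeatedly removing high-betweenness edges
toy_bet = cluster_edge_betweenness(toy_groups)

# label propagation starts from random labels, so fix the seed to repeat a run
set.seed(1)
toy_lab = cluster_label_prop(toy_groups)

print(membership(toy_bet))
print(membership(toy_lab))

# only the hierarchical result can be cut into a different number of clusters
print(is_hierarchical(toy_bet))
print(is_hierarchical(toy_lab))
print(cut_at(toy_bet, no = 3))
```

Edge betweenness removes the bridging edge first, which separates the two groups, and `cut_at()` lets you look further down the hierarchy. Label propagation returns a single flat assignment, and because it is randomized its result can change from run to run unless the seed is fixed.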
#### Community Plot ```{r cluster_label_plot, echo=FALSE, out.width="100%"} plot(lab_clust, rgraph, edge.width = if(.multiplex){E(rgraph)$weight} else {NULL}, rescale = TRUE) ``` `r if(!.groups) {""}` Credits ================================================================================ DataLab {data-width=600} -------------------------------------------------------------------------------- ### About This report was coded by [Jared Joseph](https://jnjoseph.com/) for the [UC Davis DataLab](https://datalab.ucdavis.edu/); I hope you found it useful! If you have any questions, [please send an email to the DataLab](mailto:datalab@ucdavis.edu) and we'll get back to you! ### Session Info Here is the session info of the instance that generated this report: ```{r} sessionInfo() ``` References {data-width=400} -------------------------------------------------------------------------------- ### References